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Abstract 

We compared the properties of the error threshold transition in quasispecies evolu- 
tion for three different topologies of the genome space. They are a) hypercube b) 
rugged landscape modelled by an ultrametric space, and c) holey landscape modelled 
by Bethe lattice. In all studied topologies the phase transition exists. We calculated 
the critical exponents in all the cases. For the critical exponent corresponding to 
appropriately defined susceptibility we found super-universal value. 
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1 Introduction 



The conceptual simplicity and readiness for mathematical description, along 
with obvious practical relevance makes the Darwinian evolution a prominent 
subject of theoretical biology. Moreover, the formalisation in terms of stochas- 
tic processes reveals close relations to many problems studied before in the- 
oretical physics. Therefore, it is natural to look for physical tools which may 
answer biologists' questions about natural evolution. A lot of effort was de- 
voted to this area recently [1-9]. 

The process of the biological evolution consists in three steps, though two of 
them are simultaneous: reproduction >—> mutation — > selection. The impor- 
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tant thing to note is that the biological fitness function which denotes individ- 
uals' ability to produce viable offspring is dependent on their phenotype. On 
the other hand, the mutations occur in the genotype — information stored in a 
sequence of DNA. The overwhelming complexity of the fitness function, which 
assigns to each microscopic genotype corresponding macroscopic phenotype, 
makes the theoretical treatment of evolution an extremely complicated task. 

Simplifications of the problem are necessary. One of the most prominent 
schemes was developed within the quasispecies model of the biological evolu- 
tion, introduced by Eigen [10]. It showed to be plausible and very useful when 
investigating the mechanisms of microevolution. In the language of physics 
we can roughly speak of diffusion in a complicated potential. Originally it was 
introduced as a model of chemical prebiotic evolution, but it is also reasonable 
to use it for viruses and bacteria which do not haven too complex genome. 
Among the host of other approaches, let us mention at least the model of 
coevolution described by Kauff man's NKCS model [11,12]. 

Very clear and visual presentation of the evolutionary process is to consider 
it as a random walk in the fitness landscape. This is a surface above the 
genome space, where the altitude of each point is determined by the fitness 
attributed to the corresponding genotype. In fact, the fitness landscape is not 
static, as it strongly depends on the ever changing environment, which includes 
interactions with (co)evolving species as well as abiotic influences [13]. The 
movement in the fitness landscape is driven by the mutations and optimal 
genotypes are situated on tops of peaks in the landscape. 

A broad set of different fitness landscapes was used recently to study the be- 
haviour of the evolutionary system. These models included both static [14,15] 
and dynamic landscapes [13,16-18]. They include the sharply-peaked landscape 
(SPL) , the Fujiyama landscape or the holey landscape (HL). Several recent 
reviews summarise various approaches explored [13,19-21]. 

In the Eigen's quasispecies model an individual is characterised by a sequence 
of d nucleotides (or loci) s = {si, s 2 , S3, • • • , s^}. In real situation we should 
consider that there are 4 main nucleotides A, G, C, T(U), so the alphabet 
contains four letters. Two of them are pyrimidines and the other two are 
purines. To make the treatment mathematically simpler, we will use only two- 
letter alphabet {0, 1}. We may interpret this simplification in two alternative 
ways. We can consider putting the digit in the site of the sequence when 
in the same position in the nucleotide acid is a pyrimidine and 1 if there is 
a purine. Or, we can consider a set of loci and for each locus suggest exactly 
two possible alleles, denoted by and 1. 

In every position of the sequence a mutation can occur with a probability fi. 
In our work we consider only point mutations (transversions and transitions 
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or an exchange of alleles in a certain locus) which are only a part of all pro- 
cesses which change genotype and consequently the phenotype. For example 
we neglect duplications which, it seems, are important for our understanding 
of protein interaction networks, see [22-24] . The mutation is then an exchange 
of 1 and in a particular site in the sequence or in a certain locus. The on 
site mutation rate \x is considered to be uniform for all sites of the sequence, 
although this is not the case which we see in fact. It is considered that this 
simplification does not change the results of the model. 

The properties of the evolutionary process, which we are going to compare 
in the article, are the critical coefficients of the error threshold transition. 
The error threshold is the phase transition found in SPL which separates two 
regimes of the quasispecies evolution — the adaptive regime and wandering 
regime. In the adaptive regime the cloud of quasispecies is formed around the 
wildtype, whereas in the wandering regime no wildtype is formed due to a 
mutation load. 

The error threshold phenomenon comes out of the competition between the 
selective advantage of the wildtype a and the mutation load presented by 
the rate and the genome length d. The threshold is characterised by the 
specific value of the selective advantage a c . From the physical point of view, 
the error threshold may be thought of as a dynamic phase transition from 
localised to extended ground-state eigenvector. Recently it was shown [25] 
that coevolution of several species may drive the system close to the error 
threshold. It is therefore highly biologically relevant to study the properties 
of the error threshold transition in detail. 

It is evident that the native geometry of the genome space is the hypercube 
S = {0, l} d . So, this is the natural starting point when building a model for a 
fitness landscape. The main focus will be aimed at the presence or absence of 
adaptive regime induced by a single maximum in the fitness landscape. There- 
fore, the first thing to try is the sharply-peaked landscape on the hypercube. 
This model was solved exactly by Galluccio et al. [26,27]. 

However, the real landscapes are far more complicated. Indeed, it is expected 
that rugged landscapes with many competing maxima represent a realistic 
picture [1]. Fortunately enough, such landscapes are well-known in the theory 
of spin glasses [28] and a "spin-glass" theory of evolution was investigated 
[29]. 

The main result coming from the spin-glass theory is the ultrametric structure 
of the minima in the free-energy landscape of spin glasses, which can be trans- 
lated into ultrametric or hierarchical structure of maxima in the corresponding 
fitness landscape. 

Somewhat more coarse-grained description of the evolution may arise, if we 
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assume the ultrametric structure of fitness maxima, fndeed, we may think of 
the evolution as a hopping process between various peaks, where the system 
spends most of its time, while the detailed course of the hops, composed of 
many individual point mutations, can be accounted for phenomenologically 
by tuning appropriately the hopping probabilities. 

Here, we will be interested again in the error-threshold transition, now in- 
terpreted in such a way that all but one of the peaks have the same weight, 
while the singled-out peak is higher and corresponds to the "wildtype" of the 
sharply-peaked landscape. Therefore, the treatment can be very similar to 
SPL from the mathematical point of view, but differs significantly in that now 
we take into account, at least approximately, the ruggedness of the real fitness 
landscape. 

Yet another approach to the modelling of the fitness landscape is possible, 
namely considering the holey landscape. Indeed, most of the point mutations 
which may occur at the basic level of the evolutionary picture are lethal for 
the individual. Therefore, the hypercube does not represent a good approx- 
imation to the evolutionary dynamics, because only few of the hypercube's 
edges represent paths to possible new genomes. A sparsely connected set of 
points selected at some of the hypercube corners is perhaps better choice. 

The problem resembles the study of topologically disordered solids, where 
the random network of bonds is often well modelled by the Bethe lattice. 
Therefore, third geometry we investigate in our work is that of the Bethe 
lattice. Again, as in the previous cases, one of the points will be considered 
as the "wildtype" with larger fitness, while all other points will have uniform 
fitness (smaller than the wildtype). 



2 General scheme for sharply-peaked and related landscapes 

In the quasispecies model, evolution of the size Pi of an asexual population for 
the genome represented by the lattice site % is governed by the equation 

Pi(f) = E^i(*) • (!) 

j 

describing a Markov process in the genome space. 

The quantity pi will represent the relative population size, or it will be propor- 
tional to the probability of a certain individual to belong to the population in 
the site i. In so doing, we tacitly assume to work within the infinite population 
limit. 
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The time evolution may take place in discrete time; in this case the derivative 
corresponds to p(t) = p(t+ 1) —p{t) . However, in our work the continuous-time 
description will be used, p(t) = ^jp-, unless explicitly stated the opposite. 

The dynamical matrix can be written in the form = Qij + 5{j(Ai — D), 
where Aj is proportional to the fitness of the genome % and so it represents a 
reproductive potential of the quasispecies. The mutation matrix represents 
the rate of mutation from the point % in the genome space to another point 
j. The diagonal elements of the matrix Qu are the probability rates of the 
perfect replication of the individual i. We may also introduce the homogeneous 
external stress which conserves the population size, through the decay rate D. 

The dynamical matrix T^- is supposed to be symmetric. This allows us to 
apply the common machinery of Green functions. Indeed, the main object of 
interest will be the resolvent (Green function) of the dynamical matrix 

0(0 = ^ (2) 

This formalism is particularly suitable for the case of the sharply-peaked 
landscape. Indeed, we suppose that the dynamical matrix can be decomposed 
as T = T° + V, where T° corresponds to the flat landscape of neutral evolution 
and the perturbation V describes the presence of the peak localised at point 

i = l, 

Vy = a5ijSn (3) 



Indeed, supposing that we are able to fully solve the neutral evolution, i. e. to 
find the "free" resolvent Q° = (( — T ) -1 , the error threshold is closely related 
to the existence of the split-off pole in the resolvent G(Q- For the position of 
the pole z we have the equation (Koster-Slater condition) 

- = G(z) (4) 
a 

where G(() = <?n(C) is the diagonal matrix element of the free resolvent at 
the site of the peak. If the equation (4) has a real solution, there is a split-off 
pole, the stationary state is localised and we are in the adaptive phase of the 
evolution. On the contrary, the absence of a real solution indicates that the 
stationary state is extended and we are in the wandering regime. 

Our strategy in the following will consist in calculating the free resolvent for 
various models of the genome space and then analysing the solution of the 
equation (4). To start with, we reexamine the previously obtained results on 
the hypercube. 
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3 Exact solution on the hypercube 



In the article [26] Galluccio et al. investigated the evolution of the quasis- 
pecies in the sharply-peaked landscape on the hypercubic lattice. The model 
is exactly soluble. Let us start by re-derivation of their main results. 

The following assumptions are made in [26]: 

(1) The topology of the genome space is the hypercube. 

(2) The evolution takes place in discrete time, i.e. ft = p(t + 1) — p(t). 

(3) At one time step, only single point mutation occurring with probability 
fi is allowed. Therefore, the mutation matrix connects only the nearest 
neighbours along the edges of the hypercube. 

(4) The fitness landscape is flat with only a single point which takes the 
additive selective advantage a > (sharply-peaked landscape). 

The consequence of allowing single point mutations in discrete time evolution, 
is that they had to restrict a mean number of mutations per genome — ge- 
nomic mutation rate ~p = fid — to be less than 1. This need not to be valid 
in general, for example in natural viral populations, as is shown in paper [30] 
where the genomic mutation rates are estimated for measles virus, poliovirus, 
VSV and rhinovirus. The values are in the range from 0.13 to 1.15 mutations 
per replication. In other work [31] the genomic mutation rates were estimated 
in the range from 0.475 to 4.28 mutations per replication. This is the reason 
why we decided to use a continuous time approximation. The value /i can be 
easily found if we know the length of the life cycle and the mutation rate per 
site per replication. We took the rate from mentioned articles. The "lifetime" 
of measles virus or more precisely the time required for production of a new 
burst of viral particles is in the order of hours or days. So, for example, the mu- 
tation rate in RNA virus is around fi = 10~ 4 per site per generation [30]. With 
the considered generation time 20 hours we get approximately /i = 1.5 x 10~ 9 
per site per second. The presented models of quasispecies have to respect the 
values of /i and the length of the sequence which is in order of kilobases. For 
instance, d = 15,894 bases for the measles virus [30]. Therefore, we choose 
the continuous-time description of the evolution process instead. The rest of 
the assumptions listed above remain valid. Let us proceed with formalisation 
of the above assumptions. 

The genome space is the rf-dimensional hypercube, S = {0, l} d . The points 
in the genome space are the sequences of base pairs i = [ii,i2, ■■■,id] £ <S- Let 
us denote i{k) G S the sequence which differs from i by the point mutation 
at k-th base pair, {i{k))i — i\ + (1 — 2i[)5ki- Then the dynamic matrix of the 
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neutral evolution corresponds to the diffusion in the hypercube 



T ij = 5 j m - V-dSij (5) 



k=l 



The sharp peak located at the wildtype sequence = [0, ...0] G S contributes 
by the term expressed by Eq. (3). 

The general condition (4) assumes very simple form, when we use the Fourier 
transform in the space S and divide the space with respect to the Hamming 
distance h from the wildtype. The number of sequences in each Hamming 
distance class is given by the binomial coefficient (fj . We obtain 




hi z + 2/i/i ^ 



In the limit 2 d — > oo the calculation further simplifies. The point is that 
the binomial distribution of classes reduces in that limit to a very thin and 
high band which we approximate by a 5-function in the mean value, (f) ~ 
2 d 5{h-d/2). 

However, we should be aware that this approximation holds only for z not too 
close to 0. On the other hand, the error threshold itself is characterised by 
the fact that the solution z of the equation (6) approaches to 0. Therefore, we 
have to consider in (6) separately the term with h = 0, while the rest of the 
sum, for h > 0, is approximated by the 5-function as hinted above. Hence, the 
approximate form of the Eq. (6) isQ 

1 1 1 2 d -l 1 

— = 1 . (7) 

a 2 d z 2 d z + fid w 

The approximation we use here goes along different line than the one used in 
[26] but it gives the same results in lowest order. 

The solution z of Eq. (7) is 



^la-fjd + J(a- fid) 2 + -^J 



3 we have done numerical solutions of (6) and (7) for d = 44 and even for such a 
short sequence the approximative formula fits very well. So, we expect very good 
agreement for the genomes with the length comparable to e. g. the measles virus's 
genome. 
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Now we perform the limit d — > oo. It requires some care, as the product of \id 
must stay finite, otherwise diagonal elements of the transfer matrix diverge. 
So we fix p = fid constant. Performing the limit we have to distinguish two 
different cases, 



1 2aa 

z — 73 

2 d \i - a 

z = a — p 



for (a — /i) < 
for (a — p) > 0. 



(9) 



As we see behaviour of the largest eigenvalue z changes sharply in the point 
a c = p. Here we recognise the error threshold. 

Our next aim is to calculate the susceptibtiity of the quasispecies to a change of 
the sequence length. Elongation of the sequence increases a number of possible 
sequences iV = 2 d . The susceptibility, then, shows us how the largest eigen- 
value and thus a relative fitness of the wildtype changes with the "volume" of 
the sequence space. 

We differentiate (8) with the respect to N. Then the limit iV — > oo is done. 
Defining 

(J 7 

x = £5L*im (10) 
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we get in the vicinity of the error threshold 



X = -T^- l (11) 

\(X — CtA 



Now we introduce critical exponents (3 , 7 . They characterise behaviour of 
the largest eigenvalue z and the susceptibility, respectively, close to the error 
threshold, from the left (— ) and right (+) side. In the vicinity of the error 
threshold we assume the critical behaviour 

a < a c z ~ A r ~ 1 |a — a c f x ~ [a — a c | 7 , 

a > a c z ~ I a — a c f + \ ~ \ a ~ a c| 7+ • (12) 

From (9) we see that z behaves close to the error threshold as a power function 
with the exponents (3~ = —1 and fi + — 1, and the Eq. (11) implies that 
7~ = 7 + = —1. 



4 The Quasispecies Evolution in an Ultrametric Space 



The sharply-peaked landscape is certainly an oversimplified approximation. It 
describes the adaptation close to a single selected peak within the true fitness 
landscape. Indeed, in reality the landscape is rugged and consists of very many 
concurrent peaks. In order to investigate their combined effect, we must turn 
to some model scheme. 

It was realised quite a long ago that complicated free-energy landscapes with 
many minima are characteristic for disordered spin systems, namely spin 
glasses [28] . This analogy was brought to biology within the spin-glass model 
of evolution [29]. The main result of the spin-glass theory, which will be rel- 
evant for us here, is the tree-like (or ultrametric) structure of minima in the 
free-energy landscape [28,32]. We therefore assume that the same ultramet- 
ric structure of peaks will be characteristic for the rugged fitness landscape. 
Then, we will observe the evolution on a "mesoscopic" time-scale, where a 
single event will be the jump from one peak to the other. Clearly, it consists 
of many "microscopic" events, i. e. point mutations in the DNA sequence. 

The evolution takes place in the space S constructed as the set of endpoints 
of a regular three with K levels, where at each level every branch splits into 
p new branches. The ultrametric tree is illustrated in the left panel of Fig. 2. 

From the mathematical point of view, the ultrametric structure is described 
through p-adic numbers and we will use the p-adic Fourier transform as the 
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Fig. 2. (Left) The ultrametric tree with three levels and with the branching 2. Below 
are written 2-adic numbers. (Right) Division of an arbitrary tree in K levels with an 
imperfect branching p —> 1 + . The difference p — 1 is proportional to the probability 
of branching in an arbitrary level. 

main tool for the solution. The ultrametric space is represented by the set 
S = {0, 1, 2, ...,p K — 1} endowed with ultrametric distance between each pair 
of points, denoted \i — j\ p for i,j e S. We refer the reader to the Appendix 
for necessary definitions and formulae. The diffusion in the ultrametric space 
was already thoroughly studied before in the context of spin-glass dynamics 
[33-38] and the concept of p-adic Fourier transform was developed quite long 
ago in the mathematic literature [39], and was used implicitly (see e. g. [40]) 
but its use in physics was made explicit only recently [41-44]. 

The ultrametric structure will be reflected by the fact that the probability 
of jumps (mutations) from one point to the other will depend only on the 
ultrametric distance between the two points. It is analogous to the hypercube, 
where the mutation probability depends only on the Hamming distance. 

Movement of the individual on the ultrametric lattice is driven by the dynamic 
matrix T which is similar to the one used on the hypercubic lattice. There is 
one important difference — in the previous case jumps only to the nearest 
neighbours were considered, but now we must allow a possibility to jump 
further, otherwise we get evolution only in one isolated branch of the tree. 

The evolution in ultrametric analog of the sharply-peaked landscape is de- 
scribed again by the equation (1) where now the "unperturbed" matrix T° 
reflects the symmetry of the ultrametric space. This means that the value 
of the matrix element T°- depends only on the ultrametric distance between 
the points i and j. The "perturbation" V corresponds to the single site with 
selective advantage. We can write explicitly 

7§ = f if i = j 

T?j= T m if \i-j\ p = p~™ , (13) 

Vij = aSijSio 
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Moreover, we will require that the bare transition matrix conserves probability. 
This fixes the value of the diagonal element, 

f = - £ T ab = - - l)p m T m (14) 

This requirement is not substantial for our treatment, but simplifies somewhat 
the formulae. Especially it has the consequence that the matrix T° has an 
uniform eigenvector (1, 1, 1) with corresponding eigenvalue equal to 0. 

Thus, the fitness landscape is fully described by the sequence of K numbers 
T ,Ti, ...,Tk-i and the parameter a. As is the probability of hopping to 
the distance p~ m , it is natural to require that 

T° +1 > T° (15) 



To follow the general scheme of section 2 we should first find the unperturbed 
resolvent G°(C) = (C — T ) -1 . So, we need to diagonalise the matrix T°. 

To this end we use the p-adic Fourier transform. For details we refer the reader 
to the Appendix. The basic result is, that the eigenvalues of the matrix T° are 
T m , m = 0, 1, K, where 

K-l 



T =f + Y.TiP^ip-l) 

1=0 

f m =f+Y / T l p K - l -\p-l)-T m „ 1 p K - m , m = l,...,K. 



l=m 

From the conditions (14) and (15) we can immediately deduce that To = is 
the maximum eigenvalue, all other eigenvalues are negative and T^ + ± < T^. 
Using the inverse p-adic Fourier transform we get for the unperturbed resolvent 
the expression 

G{z) F^ + g~^p ' (17) 

Introducing the normalised density of states (DOS) as 

V{f) = h8(f) +j:p^ 1 (p - l)5(f - T})) (18) 

iV 3=1 

we can write the Koster-Slater condition (4) in general form 

i -=im . d9) 

a J z-T 
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Clearly, the factor p> 1 (p — 1) is the multiplicity of the eigenvalue Tj. 

Let us come back to biological aspects. Each possible quasispecies has got a 
certain number of alleles, (or sets of subsequences) but we can not consider 
that every combination of those alleles forms a quasispecies which is able to 
survive. In fact, the ultrametric tree need not be regular but bears some level of 
randomness. The ultrametric space considered so far was regular, characterised 
by fixed branching ration p. We shall take into account the randomness by the 
following approximate procedure. 

Let us take an arbitrary tree, maybe random. With the number of distinct 
end-points iV = p K kept fixed we enlarge the number of levels, K — > oo. We 
can freely do so, as the definition of levels is arbitrary: if no branches appear at 
certain level, we simply have p — 1 locally. With this definition, p may assume 
different values at different levels as well as at different branches within the 
same level. The randomness of the tree is then translated into the set of ran- 
dom p-s. We illustrate the situation in the Fig. 2. The approximation consists 
in characterising the tree by an average value of p, which now may be any 
real number larger than 1. Furthermore, we will assume that the properties of 
trees with non-integer values of p can be approximately calculated by analyt- 
ical continuation from the results obtained for regular, non-random trees and 
arbitrary integer p. Moreover, we should keep in mind that with N fixed and 
diverging number of levels K — > oo we should make the limit p — > 1 + . At the 
end of the calculations, we will make also the thermodynamic limit iV — > oo. 

Therefore, we take the formulae obtained before for general p and suppose the 
variable p is now a real number. Then, we will perform the limits 

first: K — > oo, p — > 1 + with N = p K fixed 

next: N -> oo ^ ' 



Now we should specify the form of the jump rates T m . The rate T m of a 
certain individual to mutate to some other with the sequence which is in the 
ultrametric distance p~ m from the original one will be considered in the form 

T m = tiV {m - K)T ■ (21) 

We solve this case for the value of 2 > r > 1. We will see later that this is 
indeed the interval relevant for the error-threshold phenomenon. 

The choice of an arbitrary power r allows simulation of a quite general situa- 
tion. In the limit p — > 1 possible mutation rates to distinct sites are in a very 
wide range, starting in the value [ip~^ K ~^ >T and ending with the highest pos- 
sible jump rate fi. And this is exactly, in our opinion, what we find in reality 
- a probability rate of the jump from one codon sequence (or a certain set of 
alleles) to other is not uniform. 
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In order to solve the equation (19) we need to compute the density of states 
(18). The eigenvalues of T can be easily found using the Fourier transform. 
A value of the diagonal element T is again established with the help of the 
condition that the sum of elements of the matrix's column equals zero, i. e. 
T = 0. 

^ = -- -^TP- {T - 1)K ( P (T - 1)m { 1 + tu~\ ) ~ A (22) 



The density of states T>(T) will be calculated in the limit (20). Its support 
can be found easily by computing the lowest eigenvalue T K , which in the limit 
(20) is equal to = —fzj- Substituting x = p m into (22) and inverting the 

dependence we obtain the function x(T), hence 



vif) = I 



dx 



dT 



r - 1 \ fir 



(-T ) T - x (23) 



for f G (-£t, 0) and V(f) = outside this interval. 
The equation for the split-off eigenvalue z now becomes 

(r — 1) fj, y 



a Jo z + y 



[" ^—^y (24) 

Jo z + y 



where we denoted 



and 



(25) 



p = fh (26) 

The integral on the RHS of (24) is, for non-integer e, 

*j?L d!/= v__^ + ^-i£_j_(_£y . (27) 

o z + y e sm en ' k + 1 — e \ \i J 

(For integer values of e the infinite series becomes a finite sum accompanied 
by a logarithmic correction). When analysing the equation (24) we must dis- 
tinguish two regimes. For < e < 1 the leading contribution in (27) is of order 
z £ , while for e > 1 the leading contribution is linear in z. The former regime 
applies for 2 > r > | while the latter holds for 1 < r < |. This gives for the 
critical exponent at the error threshold 

j3 + = max(l, ■ (28) 
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The location of the error threshold is determined by the first term on the RHS 
of the Eq. 27 



OL c = y—j (29) 

We can see that the error threshold vanishes in the limit r — > 2 and diverges for 
t — > 1 which justifies the choice of r within the interval (1, 2). We investigated 
also the case r = 1, where we send /i — » simultaneously with iV — > oo with 
/2 = /zlniV kept fixed. In this case we also observe the error threshold for 
certain value of p,, with exponent f3 + = 1. 



5 Evolution on Bethe lattice 



Another approach which takes into account the complicated structure of real 
fitness landscapes uses the so-called holey landscape. This concept reflects the 
fact that most of the random point mutations which may occur are lethal and 
the evolution ends in a dead end there. So, we may start with the hypercu- 
bic lattice {0, l} d as in section 3 but leave only viable sites, while all lethal 
genomes are deleted. The remaining lattice has highly random geometry and 
the symmetry properties of the hypercube, which enabled us to use the Fourier 
transform, are no more applicable here. Thus, another scheme of the solution 
should be looked for. 

In fact, topologically disordered lattices are widely studied in solid-state the- 
ory. The Bethe lattice is a well-known model of random lattices. It has got that 
large advantage that many problems can be solved analytically. It is widely 
used in statistical physics, see [45]. 

We take the infinite Bethe lattice with the connectivity k. In each site there is 
a population of individuals. With the uniform probability rate \i an individual 
may jump (mutate) to one of k nearest neighbours' site. Again one site with a 
selective advantage a is present and we will study creation of the quasispecies 
around that site. The unperturbed dynamical matrix T° corresponds to the 
topology of the Bethe lattice. The off-diagonal terms will be T°- = \x if j and 
% are neighbours on the lattice and Tf- = elsewhere. The diagonal terms on 
a lattice with connectivity k are obviously T° = —k/i for all i. 

Following the general scheme of Sec. 2 we calculate the diagonal element of the 
resolvent Q° = (( — T )" 1 at the selected site. As in the infinite Bethe lattice 
all sites are equivalent, we can calculate the diagonal element at any site inside 
the Bethe lattice. This will be done using the partitioning (projector) method. 

The procedure is based in splitting the Bethe lattice into separate parts. This 
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Fig. 3. The Bethe lattice with the connectivity k = 3 and the separation of branches 
done during the partitioning. The first partitioning takes part in the site 1, the 
second one in one of the sites 2. 

will be done in two steps, as illustrated in Fig. 3. We calculate the diagonal 
element G(() = Gn(() at site 1. First, we separate the central site 1 from the 
rest of the lattice. Because there are no loops, the rest of the Bethe lattice 
now consists of k disconnected and identical infinite branches. The branches 
start at terminal sites denoted by 2. 

Let P be the projector to the site 1 and Q = 1 — P the complementary 
projector, which separates the rest of the lattice. Then, we get for the diagonal 
element of the resolvent 



G(Q = PG(()P = 



where the self-energy has the form 



C _ PT op_ s(C) > 



E(C) = PT°Q 



Q 



c _ T o 



QT°P 



(30) 



(31) 



By extracting the site 1 the lattice consists of k identical parts. So, E(£) is a 
sum of k identical contribution, each of them coming from one isolated branch. 
We can write 



E(C) = /^r(C) , 



(32) 



where T(Q is the matrix element of the projected resolvent (( — QT°Q) 1 at 
any of the terminal sites denoted by 2 in Fig. 3. 

The next step of the partitioning takes one branch and separates the terminal 
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point 2 from it. As a result, k — 1 disconnected branches arise. Their terminal 
points are denoted as 3 in Fig. 3. An equation analogous to (30) is found also 

for r(c). 

We may iterate the same procedure arbitrary number of times. But if we 
consider infinite Bethe lattice, the branch starting with site 2 must be identical 
to any of the branches starting at site 3. This leads to closure in the relation 
for the quantity r(£). 



m ~ t+»k-^(k-ino ■ (33) 



Solving the equations (33) and (30) we find the function G((). From the two 
roots of the quadratic equation the proper one is chosen by the requirement 
that the resolvent must have correct asymptotic behaviour G(C) ~ 1/C f° r 

id -oo. 

However, one complication arises here. If we calculate the density of states 
2^(0 — ^hm e ^ ImG ? (C — is), we observe for k > 2 a gap arising between 
the upper edge of the density of states and the point ( = 0. This reflects the 
well-known pathology of the Bethe lattice, which stems from the fact that 
the surface points of the Bethe lattice constitute finite fraction of the whole 
even in the thermodynamic limit, unlike lattices embedded in <i-dimensional 
Euclidean space, where the fraction of the surface vanishes as N' 1 ^ when the 
number of sites N goes to infinity. When investigating the diffusion on the 
Bethe lattice, it has the unnatural consequence that the probability density 
flows out constantly toward the surface and tends to zero in any finite portion 
of the lattice. In order to compensate for this unnatural effect, we put uni- 
form probability flux into the lattice. In the context of biological evolution, 
this amounts simply to addition of a constant to the fitness of all genomes. 
Such an uniform change dos not affect the natural selection and evolutionary 
competition, but simplifies the mathematical treatment. 

Indeed, introducing the uniform probability influx amounts simply to shifting 
the variable ( ( + fik — 2/iy/k — 1 so that the upper edge of the density of 
states is located at £ = 0. The shifted resolvent is then 

G(C) = , =— • (34) 



(k - 2)(C + 2fiVk~^T) + k^C + 4/iCv^I 

The real and imaginary part of the shifted resolvent are shown in Fig. 4. 
Now we can turn to the solution of (4) using the expression (34). Since we are 
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Fig. 4. Real (full line) and imaginary (dashed line) part of the shifted resolvents for 
k = 3 and k = 10. 



looking for the real eigenvalue z > we obtain 

z = \ (2-k)a- - 1(1 + k^a 2 + 4/x 2 



(35) 



This solution is valid only above the error threshold: a > a c = /i(k — 
2)/y/k — 1. For smaller a the equation (4) have no real solution, there is no 
localised state and the largest eigenvalue of the dynamic matrix is z = 0. 

So we have again found that there is for k > 2 the error threshold between 
the random wandering regime and the adaptive regime at some positive value 
a = a c . For k = 2 the error threshold disappears, a c = 0. However, this case 
is of little interest for us now, because the Bethe lattice reduces to a linear 
chain, hardly being a good approximation for the holey fitness landscape we 
started with. 
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Close to the the critical point, we obtain that for k > 2 the largest eigenvalue 
z behaves as a second power of the distance from the error threshold 



z ~ — (a — a c ) 2 , when a — > a^. (36) 

So we have got the critical exponent /3£ = 2. This reflects the behaviour of 
the density of states at the band edge, which is T>(() ~ ICI 1 ^ 2 f° r an y A; > 2. In 
fact, we obtained the relation between the exponent f3 + and the behaviour of 
the density of states at the band edge already in Sec. 4 when we investigated 
the evolution in ultrametric space. 



6 Super-Universality of the Critical Exponent 7 



In all previous cases the critical exponent f3 + > has been found. In this 
section we show that if we consider power behaviour of the largest eigenvalue 
z near the critical point a c , we can find the value of the critical exponent 7 + . 

Assume that the whole sequence space contains N distinct points. Then, there 
are exactly N eigenvalues of the matrix T. It is known that the largest eigen- 
value of the unperturbed dynamic matrix T° is non-degenerate. If N is very 
large, we can approximately write for the matrix element of the resolvent 
Cr(C) — G(C) corresponding to the diffusion on the lattice with N sites that 

— TV — 1 11 

G(0 = —G(Q + -- . (37) 

We want to know what is the dependence of the split-off pole z on the genome 
size N. It is described by the susceptibility defined in (10). 

The Koster-Slater condition written as 

G{z) = -, (38) 
a 

which corresponds to (4), can be regarded as an implicit function z(N), so 
we use the formula for the derivative of the implicit function in order to get 
the susceptibility x defined in (10). In the thermodynamic limit N — * 00 the 
expression simplifies and we find that the susceptibility of the quasispecies to 
the change of the sequence length is 

X = - Z ■ (39) 

dz 
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As a next step, in order to find 7 + , we suppose that in the vicinity of the error 
threshold the highest eigenvalue can be replaced by the term z ~ q(a — a c ) /3+ , 
where q is an appropriate coefficient. So, close to the error threshold 



(40) 




hence 



fal r / a -i \ 

X = , for a — > a c (41) 



which corresponds exactly to (11). The value of the critical exponent 7+ = 
— 1 follows, and we see that the exponent does not depend on the model in 
question. This proves the super-universality of the exponent 7 + . 



7 Conclusions 



We investigated the transition between adaptive and neutral phase in biologi- 
cal evolution, which occurs at the error-threshold value of the mutation rate. 
We suggested to take into account the complicated structure of the fitness 
landscape by effectively choosing non-trivial topology of the genome space. 
In all cases we observed the formation of a quasispecies around a site with 
selective advantage. 

We compared the evolutionary process in three different topologies. The first 
one was the hypercube. (We re-derived by a different method the already 
known results and completed the study by calculating the susceptibility, not 
studied before.) This corresponds to standard sharply-peaked landscape, with 
completely flat landscape as a background. Second, inspired by the theory of 
spin glasses, we modelled the ruggedness of the fitness landscape by taking 
as the basic space the set of peaks hierarchically organized in an ultrametric 
space. Third, we took the holey fitness landscape, in which all viable genomes 
have equal fitness and all lethal genomes are cut off. We approximated it by the 
Bethe lattice, as usual in various problems dealing with randomly connected 
networks in high dimensions. 

The problem was treated as diffusion in the underlying space, the site with 
selective advantage acting as a source. The formation of a localized state 
(quasispecies) around the selected site was studied by observing the behavior 
of the maximum eigenvalue of the dynamical matrix governing the diffusion. 
If the maximum eigenvalue is split off from the rest of eigenvalues (the band), 
the state is localized and we are in the adaptive phase. 
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The split-off sets on at the critical value of the selective advantage, corre- 
sponding to the error threshold. We calculated the value of the critical value 
in all three topologies. We therefore confirmed that the very existence of the 
error threshold phenomenon does not depend on what is the structure of the 
genome space, at least what concerns the types of topology studied here. 

As a measure of the critical behavior we chose the approach of the maximum 
eigenvalue to zero when the selective advantage approaches its critical value. 
We found the behavior be dominated by a power and calculated the critical 
exponent. We found that it is always larger or equal to 1. For the hypercube it 
is 1, for the Bethe lattice it has value 2, while in the ultrametric space it may 
assume any value larger or equal to 1, depending on details of the geometry. 

In order to find the dependence between the largest eigenvalue and the volume 
of the genome space, we calculated the susceptibility of the quasispecies to 
the change of the sequence length. The susceptibility diverge in the vicinity of 
the error threshold. This could be of the partial interest for biologists, since 
natural viral population seems to have mutation rate very close to the critical 
point, see [46]. We found that the divergence of the susceptibility at the error 
threshold is governed by a power-law with super-universal, model-independent 
exponent —1. 

All the topologies which we considered were regular. Our further outlook is to 
study the influence of the randomness in the topology of the genome space to 
the evolutionary process. 
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Appendix 

We will present the basic facts on ultrametric spaces, p-adic numbers and 
p-adic Fourier transform here [32,39,41,42]. 

The ultrametric space is a special kind of a metric space. It is a set U of 
points endowed by a non-negative function expressing the distance between 
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two points in the space. For reasons which will become clear later, in this article 
we will use the notation \a — b\ p for the distance between points a,b eU. 

As usual in metric spaces, we require that the distance from a point to itself 
is zero and distance to any other point is positive. The most important prop- 
erty of ultrametric spaces is that instead of the triangle inequality a stronger 
condition holds, 

\ a — b\p < max{|a — c\ p , \c — b\ p } Va,b,cEU. (42) 

This implies that any triangle is equilateral or isosceles with a small base, 
that every point inside a ball is its centre, or that the diameter equals to the 
radius. More detailed presentation can be found in 



There is a class of the ultrametric spaces formed by p-adic numbers. These 
are defined as sums 

K-1 

a = £ a lV \ (43) 

i=0 

where p and Oj, < aj < p— 1, are natural integers. Clearly a G {0, 1, ...,p K — 
1}. The distance | • \ p is defined as \a — b\ p = p~i where p +J is the largest 
divisor of the number \a — b\ in that form (j is an integer). Note that \a\ p can 
be considered as an (ultrametric) norm of the number a. 

The integral over all p-adic numbers is defined as [39] 

iEn«), (44) 

V a=0 



f T{a)da 
Jp 



where the sum goes over whole set of p-adic numbers. With so defined integral 
we have the Fourier transform of the function T(a) in the form: 

i p k -i 

f(m) = — E e 2 ™T(a). (45) 

P a=0 

m is an element of the reciprocal space and it is in the form m = Y^f=i rriip l ~ K , 
< rrii < p — 1. If the function T(a) depends only on the ultrametric norm, 
i. e. T(a) = f(\a\ p ), and if \m\ p = p k , we get (for k — 1, . . . , K): 

f k = f(m) =f + K Y^T lP K ' l -\p - 1) - T k ^p K - k 

l =_\ (46) 
f = f(0)=f+J2T lP K ' l - 1 (p-l) . 

1=0 
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In the previous formula we defined, T = T(0), and T\ = T(a) when \a\ p = p~ l . 
On the other side for \a\ p — p~ l we get the inversion of the Fourier transform 
in the form (for 1 = 0,... , K — 1): 



Ti = T{a) = p- K f + £ f m p m - K -\p - 1) - f l+l p l - K 

m=l 

f = T{0)=p- K %+j^f m p m - K -\p-l) . 

771=1 



(47) 
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